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1. Introduction 


Nonparametric estimation of unknown densities on partially or totally bounded 
supports, with or without correlation in its multivariate components, is a recurrent 
practical problem. Because of symmetry, the multivariate classical or symmetric 
kernels, not depending on any parameter, are not appropriate for these densities. 
In fact, these estimators give weights outside the support causing a bias in bound¬ 
ary regions. In order to reduce the boundary problem with multivariate symmetric 
kernels as Gaussian, Sain (2002) and recently Zougab et al. (2014) have proposed 
adaptive full bandwidth matrix selection; but the bias does not disappear com¬ 
pletely. Chen (1999, 2000) is one of the pioneers who has proposed, in univariate 
continuous case, some asymmetric kernels (i.e. beta and gamma) whose supports 
coincide with those of the densities to be estimated. Also recently, Libengue (2013) 
investigated several families of these univariate continuous kernels that he called 
univariate associated kernels; see also Kokonendji et al. (2007), Kokonendji and 
Senga Kiesse (2011), Zougab et al. (2012, 2013) for univariate discrete situations. 
This procedure cancels of course the boundary bias; however, it creates a quantity 
in the bias of the estimator which needs reduction; see, for instance, Malec and 
Schienle (2014), Hirukawa and Sakudo (2014) and Igarashi and Kakizawa (2015). 

Several approaches on multivariate kernel estimation have been proposed for 
various processings. Garcia-Portugues et al. (2013) used product of kernels for esti¬ 
mating the different nature of both directional and linear components of a random 
vector. A classical kernel density estimation on the rotation group appropriate for 
crystallographic texture analysis has been investigated by Hielscher (2013). Sym¬ 
metric kernel smoothers with univariate local bandwidth have been studied by 
Gonzalez-Manteiga et al. (2013) for semiparametric mixed effect models. Girard 
et al. (2013) presented frontier estimation with classical kernel regression on high 
order moments. In discrete case, Aitchison and Aitken (1976) provided kernel 
estimators for binary data while Racine and Li (2004) proposed the product of 
them with classical continuous one for smoothing regression on both categorical 
and continuous data. In the same spirit of Racine and Li (2004), Bouerzmarni 
and Rombouts (2010) considered some products of different univariate associated 
kernels in continuous case; i.e. the bandwidth matrix obtained is diagonal. In 
the classical kernels case, Chacon and Duong (2011) and Chacon et al. (2011) have 
shown the importance of full bandwidth matrices for certain target densities. See 
also Hazelton and Marshall (2009) for a support with arbitrary shape. 

The main goal of this work is to introduce the multivariate associated kernels 
with the most general bandwidth matrix. In other words, the support of the sug¬ 
gested associated kernels coincides to the support of the densities to be estimated; 
also, the full bandwidth matrices take into account different correlation structures 
in the sample. Note that, a full bandwidth matrix significantly improves some 
complex target densities (e.g. multimodal); see Sain (2002). In high dimensions. 
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the computational choice of this full bandwidth matrix needs some special tech¬ 
niques. We can refer to Chacon and Duong (2010, 2011) and Chacon et al. (2011) 
for classical (symmetric) kernels. For illustrations in the present paper, we then 
focus on a bivariate case as beta kernel with correlation structure introduced by 
Sarmanov (1966); see also Lee (1996). A motivation to investigate the smoothing of 
these densities on [0,1] X [0,1] comes from a joint distribution of two comparable 
proportions. Many datasets in [0,1] X [0,1] can be found in statistical problems, for 
example, for comparing two rates. We shall examine the theoretical bias reduction 
and practical performances of the full bandwidth selection and two others band¬ 
width matrix parametrization using least squares or unbiased cross validation; 
see, e.g.. Wand and Jones (1993). 

The rest of the paper is organized as follows. Section 2 gives a complete 
definition of multivariate associated kernels which includes both the product and 
the classical symmetric ones. A method to construct any multivariate associated 
kernel from a parametric probability density function (pdf) is then provided. Some 
pointwise properties of the corresponding estimator are investigated, in particular 
the convergence in the sense of mean integrated squared error (MISE) and an 
algorithm of the bias reduction. Section 3 provides a particular study of a bivariate 
beta kernel with a correlation structure introduced by Sarmanov (1966). Also, some 
algorithms for the choice of the optimal bandwidth matrix by unbiased cross 
validation method are presented. This is followed, in Section 4, by simulation 
studies and a real data analysis of electoral behaviour of a population with regard 
to a candidate. Especially, the role of forms of bandwidth matrices is explored in 
details. Section 5 concludes with summary and final remarks, while the proof of 
a proposition is deferred to the appendix of Section 6. 

2. Multivariate associated kernel estimators 

Let Xi,..., X„ be independent and identically distributed (iid) random vectors 
with an unknown density function / on T \i, a subset of IR d (d > 1). As frequently 
observed in practice, the subset T rf might be unbounded, partially bounded or 
totally bounded as: 

T rf = R d “ x [z, oof x [u, wf w (2.1) 

for given reals u < w and z with nonnegative values of d m/ d z and d uw in {0,1,..., d] 

such that d = doo + d z + d uw . A multivariate associated kernel estimator f n of / is 
simply defined by 

— I n 

fn(x) = - Y, W VxeT rf c K d , (2.2) 

1 i =1 

where H is a d X d bandwidth matrix (i.e. symmetric and positive definite) such 
that H = H„ —> 0 ( f (the d x d null matrix) as n —» oo, and K X/ h(-) is the so-called 
associated kernel, parametrized by x and H, and precisely defined as follows. 
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Definition 2.1. Let T d (c R‘ f j be the support of the pdf to be estimated, xeT d a target 

vector and H a bandwidth matrix. A parametrized pdf K X/H (-) of support S X/ h (c R d j is 
called "multivariate (or general) associated kernel" if the following conditions are satisfied: 

x £ S X/ h, (2.3) 

E(Z x , H ) = x + a(x / H) / (2.4) 

Cov (Zx,h) = B(x, H), (2.5) 

where Z X/ h denotes the random vector with pdfK xii and both a(x, H) = (afx, H),..., a d (x, H)) T 
and B(x, H) = {hij(x, H)). , =i tend, respectively, to the null vector 0 and the null matrix 
0 d as H goes to 0 d . 

Remark 2.2. (i) The function K X/ h{-) is not necessary symmetric and is intrinsically 

linked to x and H. 

(ii) The support S X/ h is not necessary symmetric around ofx; it can depend or not on x 
and H. 

(iii) The condition (2.3) can be viewed as U xeTd S X/H □ and it implies that the associated 
kernel takes into account the support T d of the density f, to be estimated. 

(iv) If U xeTd S X/H does not contain T d then this is the well-known problem of boundary 
bias. 

(v) Both conditions (2.4) and (2.5) indicate that the associated kernel is more and more 
concentrated around ofx as H goes to 0 d . This highlights the peculiarity of associated 
kernel which can change its shape according to the target position. 

(vi) The form of orientation of the kernel is controlled by the parametrization of bandwidth 
matrix H; i.e. a full bandwidth matrix allows any orientation of the kernel and 
therefore any correlation structure. 

The following two examples provide well-known and also interesting partic¬ 
ular cases of multivariate associated kernel estimators. The first can be seen as an 
interpretation of associated kernels through symmetric kernels. The second deals 
on associated kernels without correlation structure. 

Example 2.3. (Classical kernels) The kernel estimator f, of the density f, appropriate for 
unbounded supports in particidar R rf , is usually defined by: 

— i " 

fn(x) = ~Yj Kh(x - XO, Vx G T d = R rf , (2.6) 
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where x is a target, H a bandwidth matrix and K H ( y) = (1/ det H)fC ( H 1 yj or sometimes 

k H (y) = (1/ det H ) lll K (H~ 1,2 yj/or all y G IR' 7 . The function K is the multivariate kernel 
assumed to be spherically symmetric and it does not depend on any parameter in particular 
x and H. The kernel function has also mean vector and covariance matrix respectively 
equal to zero and L; in general, the covariance matrix is the identity matrix: L = I. This 
function K is here called classical kernel. 

The following result connects a classical kernel to its corresponding symmetric 
or classical (multivariate) associated kernel. 

Proposition 2.4. Let IR d = T d be the support of the density to be estimated. Let K be 
a classical kernel with support S t / c IR' 7 , mean vector 0 and covariance matrix £. Given 
a target vector x G IR rf and a bandwidth matrix H, then the classical kernel induces the 
so-called classical (multivariate) associated kernel: (i) 

(2.7) 

on S X/H = x - HS,f with E (^ X/ h) = x d-e. a(x, H) = 0) and Cov (fZ, x , h) = HUH; (ii) 

on S X/ h = x - H 1/2 S ( ? with E (Zx,h) = x (he. a(x, H) = 0) and Cov (fZ.x, h) = H 1/2 £H 1/2 . 

Proof. We only proof (i) because (ii) is similar. From (2.2) and (2.6) with 
K h ( y) = (1/detH)K ^H _1 y), we easily deduce the expression (2.7). From (2.7) 
and Definition 2.1, for a fixed x G T rf = IR rf and for all t G Tj = IR' :7 , there exists 
u G Srf such that u = H _1 (x - t) and therefore t = x - Hu. This implies, from (2.3), 
that S X/ h = x - HSrf. The last two results are simply derived from calculating the 
covariance matrix and the mean vector of tZx,n (the random vector of pdf K X/ h) by 
making the previous substitution u = H (x - t).B 

It is known that the choice of classical kernels is not important; nevertheless, 
the best classical kernel is the Epanechnikov (1969) one in the sense of MISE with 
bounded support S L \. The most popular is the Gaussian kernel with Sj = IR‘ 7 = lb, 
L = I and therefore Cov (IZx, h) = H 2 ; see Chacon and Duong (2010) and Zougab 
et al. (2014). An interpretation of any classical multivariate associated kernel 
K X/ H (0 can be presented as follows: through the symmetry property of the classical 
kernel, the mean E (IZ x ,h) = x coincides with the mode which is the target x; 
separately and in contrario to the general case (2.5), the dispersion measure around 
of the target x, Cov (Zb /H ) = HEH which does not here depend on x, serves 
essentially to the smoothing parameters or to the bandwidth matrix. This is 
the basic concept of general associated kernels and it is a different approach with 


5 




respect to the convolution point of view. Note that the bandwidth matrix is similar 
to the dispersion matrix, which is symmetric and positive definite; see for instance 
Jorgensen (2013). For univariate dispersion parameter, we can refer to Jorgensen 
(1997) and Jorgensen and Kokonendji (2011) for different uses. 


Example 2.5. (Multiple kernels) The product kernel estimator introduced by Bouerz- 
marni and Rombouts (2010) can be defined as a product of univariate associated kernel 

estimators ofLibengue (2013). We here call it "multiple associated kernel estimator" f n of 
the density f: 

•&*> -Itn v *'- 6 ^= r ' <2 - 8) 

i= 1 7=1 


where is the support of univariate margin of f for j = 1 ,... ,d, x = (x,,..., x d ) T e 
Xy = iT^, X,- = (X a ,..., Xid) T for i = 1,... ,n, and hn, ■ • •, had are the univariate bandwidth 
parameters. The function K^ h is the jth univariate associated kernel on the support 
§ X j,hjj £ 1R- In principle, this estimator is more appropriate for bounded or partially bounded 
distributions without correlation in its components. A particular multiple associated kernel 
estimator is obtained by using univariate classical kernels. 


In the following proposition, we point out that all multiple associated kernels 
are multivariate associated kernels without correlation structure in the bandwidth 
matrix. 

Proposition 2.6. Let x^T 1 ^ 1 = T d be the support of the density f to be estimated with 
T^(c R) the supports of univariate margins of f. Let x = (x\,... ,x d ) T e and 

H = Diag(/zn,..., hdd) with hjj > 0. Let K^ h be a univariate associated kernel (see 

Definition 2.1 for d = 1) with its corresponding random variable -Zy /( on R) 

for all j = 1 ,... ,d. Then, the multiple associated kernel is also a multivariate associated 
kernel: 

d 

WO = []<%« < Z9 > 

7=1 

on S X/H = x^ =1 S Xj/hjj with IE (Z x , h) = (*i + «i(^i, ^n), ■■■,x d + a d (x d , h dd )) T and Cov (Z x , h) 
= Diag (bjfxj, hjf^ . In other words, the random variables Z^ h are independent 
components of the random vector Z x ,h- 

Proof. From (2.2) and (2.8), the expression (2.9) is easily deduced. The remain¬ 
der results are obtained directly by calculating the mean vector and covariance 
matrix of (Z^ hn , • • •, Z^ hu ) T = Z x ,h which is the random vector of the pdf (2.9). ■ 
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The multiple associated kernels have been illustrated in bivariate case by 
Bouerzmarni and Rombouts (2010). For simulation studies, the authors used two 
independent univariate beta kernels and also two independent univariate gamma 
kernels. It is easy to generalize the investigation from two to more independent 
univariate associated kernels. 

If we have an associated kernel, an estimator can be easily deduced as in 
(2.2). Otherwise, a construction of associated kernels is possible by using an 
appropriate pdf. The pdf used must have at least the same numbers of parameters 
than the number of components in the couple (x, H) as parameters of the expected 
associated kernel. The rest of this section is devoted to a construction of the 
multivariate associated kernels and then to some properties of the corresponding 
estimators. 

2.1. Construction of general associated kernels 

In order to build a multivariate associated kernel K X/ H (-)/ we have to evaluate 
the dimensions of x and H. We always have d components for the target vector 
x G T d which is completely separate from the bandwidth matrix H in the classical 
multivariate associated kernel; but, in general, x is intrinsically linked to H. 


H 

General 

Multiple 

Classical 

Full 

d(d + 3)/2 

2d 

d(d + 3)/2 

Scott 

d + 1 

d + 1 

d + 1 

Diagonal 

2d 

2d 

2d 


Table 2.1: Numbers kj of parameters according to the form of bandwidth matrices 
for general, multiple and classical asociated kernels. 


Table 2.1 gives the exact or minimal numbers kd (> d) of parameters in (x, H) 
according to different forms of H. The bandwidth matrix H = (hij)i,j=\,... : d is said 
full (i.e. with complete structure of correlations) and admits d(d + l)/2 parameters. 
It is said diagonal if H = Diag(/tn,..., hdf), i.e. without correlation, and has only 
d parameters. We denote by the Scott bandwidth matrix the form H = /zH 0 with 
only one parameter h > 0 and fixed H 0 = (hij,o)i,j=i,...,dl see Scott (1992, page 154). 
In practice, the matrix Ho can be fixed empirically from the data. Although k L i is 
the same for classical and general associated kernels in Table 2.1, the differences 
arise because of separation or not between x and H and, also, the presented k d are 
exact numbers for classical and minimal numbers for both general and multiple 
associated kernels. It becomes clear that any pdf, with at least k d parameters 
and having a unique mode and a dispersion matrix, can lead to an associated 
kernel. Now, we introduce the notion of type of kernel which is necessary for the 
construction from a given pdf. 
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Definition 2.7. A type of kernel K e is a parametrized pdf with support Sg c IR rf , 9 G 
© c such that Kg is squared integrable, unimodal with mode m G and admitting 
a d xd dispersion matrix D (which is symmetric and positive definite); i.e. 9 = (9(m, D) 
a k d x 1 vector with k d given in Table 2.1 and the d first coordinates of 9 corresponds to 
those of m. 

Let us denote by D Xo = ^(x - x 0 )(x - x 0 ) T K e (dx) the dispersion matrix of K e 
around the fixed vector x 0 . Here is a series of facts to have in mind. 

Lemma 2.8. Let K e be a type of kernel on § e c IR‘ f . The three following assertions are 
satisfied: 

(i) the mode vector m ofK e always belongs in § e ; 

(ii) Kg(m) > K 0 (p) where p is the mean vector of K e ; 

(iii) if D m tends to the null matrix, then also goes to the null matrix. 

Proof, (i) and (ii) are trivial. As for (iii), it is easy to check that D m = + 

(p - m)(p - m) T . Thus, D m tends to the null matrix means Kg goes to the Dirac 
probability in the sense of distribution; then, p - m goes to the null vector and 
therefore D f( also goes to the null matrix.* 

Without loss of generality, we only present a construction of general (or mul¬ 
tivariate) associated kernels excluding both multiple and classical ones. In fact, 
Libengue (2013) built some univariate associated kernels that can be used in multi¬ 
ple associated kernels. For this, he proposed a "mode-dispersion principle" saying 
that it must put the mode on the target and the dispersion parameter on the band¬ 
width. In the same spirit, we here propose a construction of general associated 
kernels using the multivariate mode-dispersion method given in (2.11) below. 

Indeed, since 9 = 9( m, D) is a k d x 1 vector, we must vectorize the couple (x, H) 
where x G is the target and H = (/7 ); ), ;=1/d is the bandwidth matrix. According 
to the symmetry of H, we use the so-called half vectorization of (x,H). That is 
defined as vech(x, H) = (x, vech H), where 

vech H = (h n , • • •, h ld/ h 2 2 ,..., h ld , ..., h(d-i)d/ h d d) (2.10) 

is a [d(d + l)/2] X1 vector obtained by stacking the columns of the lower triangular 
of H; see, e.g., Henderson and Searle (1979). Making general associated kernels 
from a type of kernel Kg on S g with 9 = 9(m, D) by multivariate mode-dispersion 
method requires solving the system of the equations 

(0(m, D)) T = (x, vech H) T . (2.11) 
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The solution of (2.11), if there exists, is a k d X 1 vector denoted by 6(x, H) := 
9 (m(x, H), D(x, H)). For d — 1, the system (2.11) provides the result in Libengue 
(2013). A light version will be presented for bivariate case (d = 2) in Section 3.2. 
For classical associated kernels, the system (2.11) means to solve directly (m, D) = 
(x,H). More generally, the solution of (2.11) depends on the flexibility of the type 
of kernel K e and leads to the corresponding associated kernel denoted K g ^ X/ H ). The 
constructed associated kernel satisfies Definition 2.1 of multivariate associated 
kernel: 

Proposition 2.9. The associated kernel function Ko^f), obtained from (2.11) and hav¬ 
ing support Se( X/ H)/ is such that: 


x e S 0 ( x ,h), 

(2.12) 

E [Ze(x, h)) - x = a 0 (x, H), 

(2.13) 

Cov(Ze(x,H)) = B 0 (x,H), 

(2.14) 


where Ze(x,H) is the random vector with pdfK Wx H) and both a 0 (x, H) = (a ei (x, H), ..., a ed (x, H)) T 
and B 0 (x, H) = ( ^ 0 , ; (x, FI)). . tend, respectively, to the null vector 0 and the null ma¬ 
trix 0 d as H goes to 0 d . 

Proof. The multivariate mode-dispersion method (2.11) implies @(x, H) = 

9 (m(x, H), D(x, H)) which leads to S 0 ( X/H ) := S 0 ( m ( X/H ) /D ( x ,H))- Since K e is unimodal of 
mode meSfl (see Part (i) of Lemma 2.8), we obviously have (2.12), and then (2.3) 
is checked, because m is identified to x in (2.11). Let Z,e(m, d) be the random vector 
with unimodal pdf as the type of kernel K 0 ( mD) . From Part (ii) of Lemma 2.8 we 
can write 

E (z,6( m,D)) = rn + r(m, D), 

where r(m, D) is the difference between the mean vector E (Zwm/D)) and the mode 
vector m of Ze(m,D)- Thus, from the mode-dispersion method (2.11), we have m = x 
and r(m, D) = r(m(x, H), D(x, H)); taking a 0 (x, H) = r(m(x, H), D(x, H)), Part (iii) 
of Lemma 2.8 leads to the second result (2.13), and therefore (2.4) is verified. Also, 
since fC 0 ( m/D ) admits a moment of second order, the covariance matrix of K 0(mD) 
exists and that can be written as Cov (Ze(m,D)) = B 0 (m, D); solving (2.11) and then 
taking B 0 (x, H) := B 0 (m(x, H), D(x, H)), the last result (2.14) holds in the sense of 
(2.5) using again Part (iii) of Lemma 2.8.B 

In practice, both characteristics a 0 (x,H) and B 0 (x, H) are derived from the 
calculation of the mean vector and covariance matrix of Ze(x, h) in terms of the 
mode vector m(x, H) and the dispersion matrix D(x, H). In a general way, a given 
fC X/ H or the constructed associated kernel fC 0 ( X/H ) in Proposition 2.9 (that we will call 
standard version) creates a quantity in the bias of the kernel density estimation. In 
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order to eliminate this quantity in the larger part of the support T d of the density 
to be estimated, we will also study a modified version of the standard one. The 
following two subsections investigate some properties of these estimators. 

2.2. Standard version of the estimator 

Here, we give some properties of the estimator f n of / through a given asso¬ 
ciated kernel presented in Definition 2.1 or the constructed associated kernel in 
Proposition 2.9; i.e. K 0 ( XrH )( - ) = X X/H (-). For a given bandwidth matrix H, similarly 
to (2.2) we consider 


f n (x)=-Y j K e(xM) (X i ), VxeTV 


(2.15) 


i=1 


Proposition 2.10. For given x e T d , 

E{^(x)} = E{/(Z e(X/ H))} (2.16) 

and f,(x) > 0. Furthermore, one has 


I f n (x)dx = A, 
Ji d 


(2.17) 


where the total mass A = A(n;H,K e ) is a positive real and, it is also a finite constant if 
f T/ K e(X/H) (t)dx < oo for all t 6 T d . 


Proof. The first result (2.16) is straightforwardly obtained from (2.15) as follows: 


e{/„(x)}= f 

ds e(x ,H)nT rf 


^0( X/ H mm =E(/(z e „, H) )). 


Also, the estimates f n (x ) > 0 and the total mass A > 0 stem immediately from the 
fact that fx 0 ( x ,H)(') is a pdf. Finally, the values of X, belonging to the set T d , we have 
(2.17) as 


f n (x)dx 

dT d 


1 

n 



Kd(x,H)(X-j)dx < oo. 


since the integration vector of f T K S ( X/H )(X,jdx is on the target x which is a parameter 

Of fC 0 ( X/ H)(-)-* 

From the above Proposition 2.10, the total mass A of f, by a non-classical as¬ 
sociated kernel generally fails to be equal to 1; an illustration for d = 2 is given 

in Table 3.2 below. Hence, non-classical associated kernel estimators f„ are im¬ 
proper density estimates or as kind of "balloon estimators"; see Sain (2002) and 
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Zougab et al. (2014). The fact that A = A(n;H,K g ) is close to 1 can find a sta¬ 
tistical explanation in both Examples 1 and 2 of Romano and Thombs (1996), or 
by showing E^J^ f n (x)dxj = 1 which does not depend on n: E { ^ f n (x)dxj = 

E {(/)(Xil Se(xH)nTrf ;H,R 0 )) with <p(t; H, K e ) = f T/ K e(X/H) (t)dx < oo for all t G T d . With¬ 
out loss of generality, we study x i-» /„(x) up to normalizing constant which is 
used at the end of the density estimation process. 

Proposition 2.11. Let x G T d be a target and a bandwidth matrix H = H„ —» 0 d as 
n —> oo. Assume f in the class 1K 2 (T d ), then 

Bias |/„(x)} = ag(x,H)V/(x) + ~ trace [ja e (x, H)aJ (x, H) + B 0 (x, H)| V 2 /(x)] 

+o {trace (h 2 )) . (2.18) 

Furthermore, if f is bounded on T d then there exists the largest positive real number 
r 2 = r 2 {K e ) such that ||Re( X/ H)|| 2 < C 2 (x)(detH) - '' 2 , 0 < C 2 (x) < oo and 

Var{/„(x)) = ^ ||K e(x ,H)|| 2 /(x) + o(n _1 (detH)“ r2 ), (2.19) 

with || X 0 ( x ,h)||o := f s K 2 (x H ^(u)du and where “<" standsfor "<and then approximation 

as n —> oo". 


Proof. See Appendix in Section 6. 

The univariate case (d = 1) of Proposition 2.11 can be found in Libengue (2013); 
see Chen (1999, 2000) for both beta and gamma kernels with r 2 - 1/2 and, also, 
Hirukawa and Sakudo (2014) for other values of r 2 . In the situation of multiple 
associated kernels (2.9), in contrario to (2.18) the general representation (2.19) is 
simply expressed in terms of univariate associated kernels as follows: 


Corollary 2.12. Under (2.9) with r 2r j = r 2 j and where = 
univariate type of kernel, then (2.19) is equivalent to 


Var j/„(x)} 


n il iM 


i =i 


/(x) + o 


n 


'U h 


7=1 


refers to the jth 


\ 

~ r 2.j 

77 


Proof. Easy.B 

Now, we recall natural measures for assessing the similarity of the multivariate 

associated kernel estimator f n according to the true pdf /, to be estimated. Since 
the pointwise measure is the mean squared error (MSE) and expressed by 

MSE (x) = Bias 2 {^(x)} + Var j^(x)}, (2.20) 
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the integrated form of MSE on T rf is MISE = J T MSE (x) dx and its approximate 
expression satisfies 


AMISEI 


(K) = J' | a e ( x / H ) v / ( x ) + \ trace {(ae(x, H)aJ(x, H) + B 0 (x, H)) V 2 / (x)j 


1 

+ - 

n 


Ik 


0(x,H) 


,f(x))dx. 


( 2 . 21 ) 


In the general case of associated kernels with correlation structure, an optimal 
bandwidth matrix that minimizes the AMISE (2.21) still remains challenging prob¬ 
lem, even in the bivariate case. However, the particular case of diagonal bandwidth 
matrix is H opt/ d iag = O (7r 2 A +4 >j for multiple gamma kernels; see Bouerzmarni and 
Rombouts (2010) for further details. So, the next result only gives the optimal 
bandwidth matrix for the Scott form which still has a correlation structure. 


In fact, let us consider H = /zH 0 be a Scott bandwidth matrix with fixed 
matrix H 0 and positive h = h n —> 0 as n —> oo. From Definition 2.1, Proposi¬ 
tion 2.4 and Proposition 2.9, it follows that there exists a d X 1 vector c*(x, H 0 ) = 
(c*(x, H 0 ),.. . ,c*(x, H 0 )) T and a d X d matrix C“(x,H 0 ) = (c**(x, H 0 )); /;= i. d of fi¬ 

nite constants connected respectively to ag(x, H) = (a 0 \ (x, H),..., a (k i(x, H)) - and 

B 0 (x, H) = (bgij(x, H)) i/j=1 . d such that for all i,j = 1,..., d, a e j(x, H) < hc*(x, H 0 ) and 

bgij{x, H) < h 2 c**.(x, H 0 ). Using Proposition 2.11 and assuming / such that its all 
first and second partial derivatives are bounded, one has: 

Bias |/i!(x)} < hs-i (x) and Var {/ n (x)} < n~ 1 h~ dr2 S 2 (x), 

where Si(x) > c* T (x, H 0 )V/ (x) + (/z/2) trace [{c’ T (x, H 0 )c*(x, H 0 ) + C*(x, H 0 )} V 2 / (x)] 
and S 2 (x) are positive scalars. From (2.20), it follows that MSE (x) < h 2 s 2 (x) + 
n~ 1 h~ dr2 S 2 (x). By integration of MSE (x), one obtains 

AMISEC/^h 0 ,w) ^ h2f i + n l h- dr2 t 2 , ( 2 . 22 ) 

with t\ and f 2 the anti-derivatives of respectively s 2 (x) and S 2 (x) on Tj. Taking the 
derivative of the second member of the inequality (2.22) equal to 0 leads to the 
following proposition. 


Proposition 2.13. Under assumption of Proposition 2.11, let H = /zH 0 be a Scott band¬ 
width matrix with fixed matrix H 0 and positive h = h n —>0asn^>oo and such that the 
right member of (2.22) satisfies 

0 < /z 2 h + < oo, (2.23) 

where r 2 = r 2 (K e ) given in (2.19). Then, the optimal bandwidth matrix H opt/Scott mini¬ 
mizing the AMISE in (2.21) is 

H opt ,scott = Cn-^^Ho, (2.24) 

where C is a positive constant. 
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Note that we can use the Scott bandwidth matrix H = HH 0 if (2.23) holds. However, 
in practice, we cannot check (2.23) because / is unknown. But, if the quantity 
h 2 ti + n~ l h~ dr2 t 2 of (2.23) becomes 0 (resp. oo) then one observes an undersmoothing 
(resp. over smoothing). Finally, the practical choice of Ho in (2.24) canbe the sample 
covariance matrix. 

Unlike to classical associated kernels for = IR rf (Example 2.3), the choice of 
non-classical associated kernels is very important for the support Tj (c IlUj of the 
pdf / to be estimated; see Parts (i)-(iv) of Remark 2.2. Also, from Parts (v)-(vi) of 
Remark 2.2, different positions of the target x£Tj and correlation structure need a 
suitable general associated kernel. Nevertheless, the selection of bandwidth matrix 
remains crucial when the general associated kernel is chosen; see, e.g. Chacon and 
Duong (2011) and Chacon et al. (2011). Here, we consider the multivariate least 
squares cross validation (LSCV) method to select the bandwidth matrix. From 
(2.15), the LSCV method is based on the minimization of the integrated squared 
error (ISE) which can be written as 

ISE(H) = | f n (x)dx — 2 f f n (x)f(x)dx + ( f 2 (x)dx. 

JT d J T rf J T d 

Minimizing this ISE(H) means to minimize the two first terms. However, we need 
to estimate the second term since it depends on the unknown pdf /. The LSCV 

estimator of ISE(H) - I f 2 (x)dx is 

Jl 1 ,; 

r — 2 2 " — 

LSCV(H) = j/„(x)} dx--YfUX t ), 

n “7 

where f n -i (X,) = (n - 1) _1 ^ ^e(x„H)(X ; ) is being computed as /„(X,) excluding 

i*i 

the observation X,. The bandwidth matrix obtained by the LSCV rule selection is 
defined as follows: 

H = arg min LSCV(H), (2.25) 

H e'H 

where 'H is the set of all positive definite full bandwidth matrices. The LSCV rule 
is the same for the Scott and diagonal bandwidth matrices where S and T) are 
their respective sets. The difficulty comes from the level of dimension of H which 
is, respectively, d(d +1)/2,1 and d for < H, S and T). Thus, for high dimension d > 2, 
the set of the Scott bandwidth matrices might be a good compromise between 
computational problems and correlation structures in the sample. 

2.3. Modified version of the estimator 

Following Chen (1999, 2000), Libengue (2013), Malec and Schienle (2014), 
Hirukawa and Sakudo (2014) and Igarashi and Kakizawa (2015) in univariate 
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case, a second version of the estimator (2.15) is sometimes necessary. Indeed, the 
presence of the non null term a g(\, H) with the gradient V/(x) in (2.18) increases 
the pointwise bias of /„ (x). Thus, we propose below an algorithm for eliminating 
the term of gradient in the largest region of T^. Since Bouerzmarni and Rombouts 
(2010) has shown the results for multiple associated kernels, we here investigate 
the case of general associated kernels with d > 1. 

The algorithm of bias reduction has two steps. The first step consists to de¬ 
fine both inside and boundary regions. The second one deals on the modified 
associated kernel which leads to the bias reduction in the interior domain. 

First step. Partitioning into two regions of order a(H) which is a d X1 vector, 
and where n(H) tends to the null vector 0 as H goes to the null matrix Of. 


a. interior region is the largest one inside the interior of T d in order to contain at 
least 95 percent of observations, and it is denoted by 


b. 


boundary regions representing the complementary of r T‘J (H) ' 1 in T d , and it is de¬ 
noted by which could be empty; recall that T d = TP f l(H) ' ; u and 


T «(H ),I n T «(H),B 
d d 


= 0 . 


Since T d (c might have each one of its d convex components as unbounded, 

partially bounded or totally bounded interval as in (2.1), there is only one 
but 

N d = l d °°2 d --3 d ‘™ - 1 (2.26) 

boundary subregions of TP^ h ^ b . In the below Section 3.3.2 an illustration is pro¬ 
vided for d = 2 with T 2 = [0,1] X [0,1] and, therefore, the number of boundary 
subregions (2.26) is N 2 = 3 2 - 1 = 8. 

Second step. Changing the general associated kernel fC 0 ( X/H ) into its modified 
version Ry x H) ; that leads to replace the couple (a e (x, H), B 0 (x, H)) into (ay(x, H), Bg(x, H)) 
with a ( j(x, H) = a f y (x, H) L t ,«),b(x) because a^(x, H)l Ta ( H y(x) = 0 in the interior, and 
Bg(x, H) = By(x, H)1 . t i(H),j(x) + By(x,H)l Tl (H),/,'(x). This modified associated kernel 
is such that, for any fixed bandwidth matrix H, 


8(x,H) = 


0j(x,H): ay(x,H) = 0 ifxGT“ (H)J 
0 B (x, H) if x g tt" (h),b 


must be continuous on T d and constant on T'j (H) ' B . 


(2.27) 


Proposition 2.14. The function Ky x H) on its support Sy x Hj = Se( X/ H) and obtained from 
(2.27) is also a general associated kernel. 
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Proof. Since K Q ( X/ h) is a general associated kernel for all x e U 

one gets the first condition of Definition 2.1 because Sg( X/ H) = S f j (x H) . According to 
Proposition 2.9 it follows that, for a given random variable H) with pdf Kg, H) , 
we obtain the last two conditions of Definition 2.1 as 

E fe(x,H)) = x + a e( x ' H) and Cov (^e ( x,H)) = B e( x ' H )- 

Both quantities agfx, H) and Bg(x, H) tend, respectively, to 0 and 0,; as H goes to Od¬ 
in particular, from (2.27) we have a^(x, H) = ().■ 

Similar to both (2.2) and (2.15), the modified associated kernel estimator f, using 
Kg {x H) is then defined by 

& x > = iEw*>. (2 - 28) 

i =1 

The following result gives only in the interior TT^ (H) ' 7 of T,f the pointwise expressions 
of the bias and the variance of /„. Of course, the corresponding expressions in the 
boundary regions T‘ f l(H)/B are tedious to write with respect to the Nj (2.26) boundary 
situations from (2.27). 


Proposition 2.15. Let f n and f n be the multivariate associated kernel estimators of f 
defined in (2.15) and (2.28) respectively. Then, for x e TTJ (H) ' 7 as j n (2.27): 

Bias j/„(x)| = ^ trace (b^(x, H)V 2 / (x)) + o (trace (h 2 )) (2.29) 


and 


Var{/„(x)j ^ Varj/„(x)) 


as n 


OO. 


Proof. We have the first result (2.29) by replacing in (2.18) /„, ag and Bg by f„, 
Ag and Bg, respectively. For the last result, considering (2.19) it is sufficient to 
show that 


Ik 


0(x,H) 


K 


0(x,H) 


as H —> 0,i. 


Since K d ( X/ H ) and Kg, H) are general associated kernels of the same type K e with 
r 2 = r 2 (K 0 ) and r 2 = r 2 (K-~), then there exists a common largest positive real number 


r* = r* 2 (K e ) such that 


||^ 0 (x / h )||2 ^ c 2 (x)(detH) r 2 and 


K— 

0(x,H) 


2 

2 


< c 2 (x)(det H ) _r 2 


with 0 < c 2 (x),c 2 (x) < oo. 
c(x)(det H )~ lr 2 and |^ 0 ( X/H ) 


then 


K- 


0(x,H) 


||^0(x,H) 


Taking c(x) = sup (c 2 (x),c 2 (x)}, we have iKg^H)^ ^ 

< c(x)(detH)~ 2 f 2 . Since c(x)/n(detH) 2r z = o(n _ 1 (detH) _ 2 , 2 ) 
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Thus, we define the asymptotic expression of the MISE of f„ in the interior 
TE! (H ^ as follows: 

a 

AMISE 5i (£) = £ mi ({i trace (B^x, H)V 2 / (x))}‘ + \ ||K efcH) |g/(x)j dx. 

All the results in this section can be easily deduced for both cases of diagonal and 
Scott bandwidth matrices. The following section provides some detailed results for 
d = 2 with a bivariate beta kernel having correlation structure on T 2 = [0,1] X [0,1]. 
The corresponding associated kernel is built from a technique due to Sarmanov 
(1966) and using two independent univariate beta pdfs. See Balakrishnan and Lai 
(2009), Kundu et al. (2010) for some other examples of bivariate types of kernels 
and Kotz et al. (2000) in multivariate case. 


3. Bivariate beta kernel with correlation structure 


This section presents the generalizable procedure of a bivariate beta kernel 
estimator from a bivariate beta pdf with correlation structure to its corresponding 
associated kernel. The standard associated kernel is built by a variant of the 
mode-dispersion method deduced from (2.11). Then, we provide properties of 
both versions of the corresponding estimators. 


3.1. Type of bivariate beta kernel 

In order to control better the effects of correlation, we here consider a flexible 
type of bivariate beta kernel for which the correlation structure is introduced by 
Sarmanov (1966); see also Lee (1996). Let us take two independent univariate beta 
distributions with pdfs 


gft) 


t p r\l - j = 1,2, 


1 


@(PpHj) 


(3.1) 


where £%(pj,qj) = £' t p > 1 (1 - t)T 1 dt is the usual beta function with pj > 0 and 
cjj > 0. Their means and variances are, respectively. 


Vi 

Ti = ~h- = TjiPnT) and °) = 
Pi + Hi 


PiHi 


(i Pi + Hj) 2 (Pi + Hi + 1) 


= °)(.Vi'Hj)- (3-2) 


Also, gj are unimodal for fj > 1, qj > 1 and ( Pj,qj ) + (1/1)/ with mode and 
dispersion parameters: 


m j(Pj'Hj) 


Pj- 1 
Pi + Hi ~ 2 


and d 


1 


Pi + Hi~ 2 


djiPjr Hj)‘ 


(3.3) 
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The corresponding pdf (or type of kernel) of the bivariate beta-Sarmanov with 
correlation and from gj of (3.1) is then denoted by g e (= BSg ) and defined as: 


ge(y) = gi(vi)g 2 (v 2 ) 


V 1 - fafa, (fr) V 2 ~ p 2 (p 2/ C\l) 

1 + px--— / X 


oi(pi,qi) 


02(V2,qi) 


L [ 0 ,l]x[ 0 , 


,i](v), (3.4) 


with v = {px,v 2 ) T and 6 := 9(px,qx r p 2 ,q2, p) e © c 1R 5 . Depending on pj and qj, the 
correlation parameter p = p(px,qx,P 2 ,q 2 ) belongs to the following interval 


[-£,£'] c [-1,1], 


(3.5) 


with nonnegative reals 


£ = 


max 

\ V\,V2 


pi - pi(pi,qi) 
Oi(jpi,qi) 


x 


V2~ P2(p2,q2) \\ 1 

o 2 (p2,q2) j) 


and 


mm 


f Pi ~ pi(pi,qi) v 2 - P2(p2,q2) ' 


, -l 


yi'°2 [ 0i(pi,qi) o 2 (p2,q2) 

Thus, the mean vector and covariance matrix of gg are, respectively. 


p = (pi, P 2 V and L = 


a\ o x o2p 

OxO2p o\ 


The unimodality of gg in (3.4) also occurs for pj > 1, qj > 1 and (pj, qj) y (1,1) with 
j = 1,2. However, the corresponding mode vector m e = m(p\,p 2 ,qx,q 2 ,p) -■ nip 
of gg does not have an explicit expression; but, we numerically verified that this 
mode vector m p is slightly shifted with respect to the modal vector (3.3) of the two 
independent margins that we denote by m 0 = {nh{px, qx),m 2 {p 2 , q 2 )) T for p = 0. 

Figure 3.1 illustrates some different effects of both null and positive correlations 
(3.5) on the unimodality of (3.4). The negative correlations will show the opposite 
effects in terms of positions according to the modal vector m 0 for p = 0. In other 
words, the correlation parameter p enables the pdf gg to reach points which are 
inaccessible with the null correlation. The parameter values of both univariate 
beta (3.1) used for Figure 3.1 produce the following intervals (3.5) of correlation: 


• p G [-0.100,0.210] if pi = p 2 = l,^i = q 2 = 8/3 for an angle; 

• p e [-0.120,0.143] if pi = 5/3, p 2 - l,qx - 2 ,q 2 = 8/3 for an edge; 

• p e [-0.008,0.214] if p x = 149/60, p 2 = 151/60, q x = 71/60,^ = 23/20 for the 
interior. 
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max g: g(0,0) = 7.11 


max g: g(0,0) = 9.35 



(aO): p = 0 (al): p = 0.18 


max g: g(0.4,0) = 3.86 


max g: g(0.32,0) = 4.11 



0.0 0.0 


0.0 0.0 


(bO): p = 0 


(bl): p = 0.12 


max g: g(0.89,0.91) = 3.58 max g: g(0.92,0.94) = 4.39 



0.0 °-° 0.0 0 0 


(cO): p = 0 


(cl): p = 0.19 


Figure 3.1: Some shapes of the bivariate beta-Sarmanov type (3.4) with different 
effects of correlations on the unimodality ((a): pi = pi = l,qi = cj 2 = 8/3; (b): p\ = 
5/3 ,p 2 = l,q! = 2,ci2 = 8/3; (c): p x = 149/60, p 2 = 151/60,^ = 71/60, q 2 = 23/20). 
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Concerning the dispersion matrix D p of the bivariate beta-Sarmanov type we 
consider 


( d x (d!d 2 ) 1/2 p\ 
p \(did 2 ) 1/2 p d 2 )’ 


(3-6) 


where d\ and d 2 are the dispersion parameters (3.3) of margins and p the correlation 
parameter. This dispersion matrix is the analogue of the covariance one. Since we 
do not have a closed expression of the modal vector m p , we cannot use the bivariate 
mode-dispersion method (2.11) for a construction of the bivariate beta-Sarmanov 
kernel. 


3.2. Bivariate beta-Sarmanov kernel 


From the previous section, the standard version of the bivariate beta-Sarmanov 
kernel is here constructed by using the modal vector m 0 = (mi(pi,qi), m 2 (p 2 ,q 2 )) T 
of p = 0 instead of m p as a variant of the mode-dispersion method (2.11). This 
choice will be compensated in the bandwidth matrix H connected to the complete 
dispersion matrix D p with correlation structure (3.6). 

Indeed, solving (@(m 0 , D p )) = (x, vech H) T in the sense of m 0 = x and D p = H 
leads to the new reparametrization of g e of (3.4) from 6 = d{pi,qi,p 2 ,q 2 , p) c ]R 5 
into 

(h i h 12 
\hi 2 h 22 
(3.7) 

Rewriting (3.2) in terms of (3.7), the means pj(pj,qj ) and variances o 2 (pj,qj) of the 
univariate beta pdfs become 
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a, Tj\ _/ Xi 1 - X! x 2 l-x 2 h 

6{x, H) - -— + 1, — - + 1/7— + 1' “1 - + 1/ 77— 7 \i i 2 

\hn hn h 22 h 22 (^ 11 ^ 22 )' t 


Vx = 


/H 



Xj + hjj 

1 + 2 hjj 


Pj(Xj, hjj) and a 2 


(xj + hjj)( 1 + hjj - xj) 

(1 + 2hjjf{l + 3 hjj) 



Finally, the bivariate beta-Sarmanov kernel is defined as BSg( Xr h) := ge( X/ H) such that 


BS0( X/ H)(r»i,r»2) 


v* llhl (l - Ui) (1 


-xp/hu 


\ ( 


SSty 1 + Xi/h n , 1 + (1 — Xi)/h-n) 


v x 2 2 ' 122 (1 - v 2 f 


-xplh-ri 


SS(p\ + x 2 /h 22 ,1 + (1 — x 2 )/h 22 ) 


x 


vi ~ni(xi,h u ) v 2 - p 2 {x 2 ,h 22 ) 

1 + h 2 x ——-x ——3 - 

h'oi{xi,hi\) h' o 2 (x 2 ,h 22 ) 


l l01] 2 (v lr v 2 ) r (3.8) 


with the constraints 


hi 2 £ [—^, ft] n y/hnh 22 , y/^11^22) , 


(3.9) 


( 


P = 


max 

Vl,v 2 

V 


Vi - pi(xi,hu) v 2 - p 2 (x 2 , h 22 ) 
h\^ 0 i(xi,hn) h 2 2 o 2 (x 2 ,h 22 ) 
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and 


( 


P = 


min 

Vl,v 2 

V 


V\ 

h^OripcxMi) 


V 2 ~ t-l2(X2,h 22 ) 

h\ 2 <J 2 (x 2 ,h 22 ) 



The first interval [-($,($'] of (3.9) is the equivalent [-£,£'] in (3.5) and the sec¬ 
ond one (- ^Jhnh 22 , ^Jh n h 22 ) of (3.9) is due to the constraints from the bandwidth 
matrix H, which is symmetric and positive definite. In practice, one often has: 
[~P'P'] c (- V^n ^22 / ^Jh n h 22 ). Of course, the beta-Sarmanov kernel BS 0 ( X/H ) satis¬ 
fies Definition 2.1 of associated kernel: 


T (1 - 2 xdhu 

S 0 ( X/ H) = [0,1] X [0,1], a 0 (x,H) = (a ei ,a e2 ) with a ej = 1 + 2h = a ej (xj r hjj) r 


n 


B e (x,H) = (b gi j)i,j = 1,2 with b gjj = o 1 j {x j ,h jj ) and b ei2 


h\i 


x/hnhn 


oi{xi,h n )o 2 (x 2 ,h 22 ). (3.10) 


Table 3.1 shows some effects of the correlation parameter /t 12 in BS 0 ( X/H ) on the 
modal vector and maximum values, which are obtained by using corresponding 
values of (pi, q\, p 2 , q 2/ p) for Figure 3.1. 


Position 

X 

Interval of h\ 2 

H 

Maximum value of BSg 

Angle 

(0, 0) 

[-0.040, 0.128] 

Diag (0.600,0.600) 
/0.600 0.128\ 
\0.128 0.600j 

BS 0(X/ h)(O,O) = 7.11 

BS 0(x , H) (O,O) = 9.77 

Edge 

(0.40, 0.00) 

[-0.050, 0.104] 

Diag (0.600,0.600) 
/0.600 0.104\ 
\0.104 0.600j 

BS 0(x<H) (O.4O, 0.00) = 3.86 

BS 0(x ,h)(O.32, 0.00) = 4.11 

Interior 

(0.89, 0.91) 

[-0.060, 0.128] 

Diag (0.612,0.600) 

/0.612 0.123\ 
(o,123 0.600j 

BS 0(x , h) (O.89,O.91) = 3.58 

BS 0(x<H) (O.92, 0.94) = 4.46 


Table 3.1: The corresponding values of Figure 3.1 for the bivariate beta-Sarmanov 
kernel (3.8). 


3.3. Bivariate beta-Sarmanov kernel estimators 
3.3.1. Standard version of the estimator 

In this particular case, the beta-Sarmanov kernel estimator 

— i " 

fn(x) =~Yj BS W x <), Vx g [0,1] X [0,1] (3.11) 
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A(n,H, BS 0 ) 

sample 1 

sample 2 

sample 3 

sample 4 

/Zl2 

=-0.0003 

1.034019 

0.9954256 

1.002369 

1.025671 

/Zl2 

= 0 

1.034078 

0.9955653 

1.002418 

1.025731 

h 2 

= 0.0004 

1.034157 

0.9957517 

1.002483 

1.025811 


Table 3.2: Some values of A(n; H, BSg ) for hu = 0.10,/z 22 = 0.07 and n = 1000. 


also satisfies Proposition 2.10. Table 3.2 allows to observe the effect of correlation 
(3.9) on the total mass A(n;H, BSg) ^ 1 by using four samples of simulated data. 
Fixing x = (xi,x 2 ) t in [0,1] X [0,1], the pointwise bias is written as 

df 


(— | df df 1 

Bias |/h(x)| = a e 1 —(x) + ci ei — (x) + 


dx\ 


'dx 2 


? d 2 / d 2 f 

(a m + bgn) — (x) + 2(z7 0 ifl 02 + bgu)^, - X —(x) 


dxidxi 


dx\ c)x 2 


d 2 f ) 

+ (a 2 e2 + bg 2 l) (x) > + o(/7^ + lh\ 2 + /z| 2 ); 


and, the pointwise variance is 


with 
||BS e(X/ H)| 


Var j/„(x)j = - ||BS 0(x , H )|| 2 /(x) + o(n 1 (detH) r2 ), 

SS (1 + 2xi//7n, 1 + (1 — Xi)lhn) Sd (1 + 2x 2 //z 22 , 1 + (1 — x 2 )//z 22 ) 
{£$ (1 + xi/hn, 1 + (1 - Xi)/hn) Sd (1 + X 2 /J 122 ,1 + (1 - x 2 )/fz 22 )} 2 


x U + 


(2xi - 1)(2 x 2 - 1) / ( x 1 + hu) X (1 + 3/7h)(x 2 + /z 22 ) X (1 + 3/z 22 ) 


1/2 


2(1 + h n )(l + h 22 ) \ 


(1 — X\ + /zn)(l — x 2 + lz 22 ) 


02 


+ 


X 


2xi + h 


11 


+ 


2(xi + h n )(l + ^ 11 ) \ / 2x 2 + /z 22 2 (x 2 + lz 22 )( 1 + /z 22 ) 


+ 


Ml + 2M- 1 /z 2 (2 + 3/zn)- 1 ) \h n (l + 2M' 1 /z 2 (2 + 3/z 22 ) 


-1 


(1 + ^ 11 ) X (1 + 3/zn)(l + lz 22 ) J (1 + 3 /z 22 ) 


-lz 


^4(1 — Xj + hn)(2. + 31zii)(l — x 2 + fz 22 )( 2 + 31z 22 ) 
Using (3.10) the AMISE (2.21) becomes here 


AMISE(D = f ( 

J[0,1]X[0,1] \ 




df 


1 


IM 




^01 ^—( x ) + zz 0 2 ^;—(x) + - ( {a ei + bgn)—j(x) 


dx\ 


' dx 


dx 2 


d 2 f 7 d 2 f 

+ 2 (agiClg 2 + bgi 2 )^. - y [—(x) + (fl 02 + bg 2 l)^T^(x) 

0 X 10 X 2 ox ^ 


t "I 2 y 

| + “ \\ BS eA 


0(x,H) L/(x) 


dx. 


The bandwidth matrix is selected by the LSCV method (2.25) on the set D of 
diagonal bandwidth matrices. Concerning full and Scott cases, this LSCV method 
is used under Vti and <Si, respectively, subsets of < H and S verifying the constraint 
of the beta-Sarmanov kernel (3.9). Their algorithms are described below and used 
for numerical studies in Section 4. 
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Algorithms ofLSCV method (2.25) for three forms ofbandwidth matrices in two dimensions 
(d = 2) 

Al. Full bandwidth matrices. 

1. Choose two intervals H n and H 22 related to hn and h 2 2 , respectively. 

2. For b = 1,..., and y = b(H 22 ), 

(a) Compute the interval H 12 [b, y] related to h 12 from constraints (3.9); 

(b) VoxA = l,...,£(H 12 [b,y}), 

Compose the full bandwidth matrix H(<5, y, A) := (lufb, y, A)). with 

h n (b,y,A) = H n (b), h 22 (b,y,A) = H 22 (y ) and h 12 (b,y,A) = H 12 [b,y](A). 

3. Apply LSCV method on the set'Hi of all full bandwidth matrices A). 

A2. Scott bandwidth matrices. 

1. Choose an interval H related to h and a fixed bandwidth matrix Ho = 

( hoi ’)i,j= 1,2 

2. For C = 1/ • • . 

(a) Compute the interval H 012 [Q related to h 012 from constraints (3.9); 

(b) For k = 1,, b(H 0 n[Q), 

Compose the given bandwidth matrix H 0 (G x) := (lioijiC, x)). with 

WC,K) - /toil/^ 022 (C/K) = h 022 and h^ 2 fC, k) = H 0 i 2 [C](x); 

(c) Compose then the Scott bandwidth matrix H(C, x) := H(Q X H 0 (G x). 

3. Apply LSCV method on the set *Si of all Scott bandwidth matrices H(C, x). 

A3. Diagonal bandwidth matrices. 

1. Choose two intervals H n and H 22 related to hn and h 22 , respectively. 

2. For b = 1,..., i(H n ) and y = 1,..., £(H 22 ), 

Compose the diagonal bandwidth matrix H(<5,y) := Diag (H n ( 6 ),H 22 (y)). 

3. Apply LSCV method on the set D of all diagonal bandwidth matrices 
H( 6 ,y). 

Let us conclude these algorithms by the following precisions. For a given interval 
l, the notation C(l) is the total number of subdivisions of I and /(?/) denotes the 
real value at the subdivision r\ of I. Also, for practical uses of (Al) and (A3), both 
intervals H n and H 22 are generally chosen to be (0,1). In the case of the Scott 
bandwidth matrix (A2), we retain the interval H = (0,2) and the fixed bandwidth 
matrix H 0 = £, where L is the sample covariance matrix. See Figure 4.2 for 
graphical illustrations. 
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3.3.2. Modified version of the estimator 

Being large in the standard version, the pointwise bias of the beta-Sarmanov 
kernel estimator (3.11) must be reduced. Following the algorithm of Section 2.3 
and without numerical illustration in this paper, the first step divides [0,1] X [0,1] 
in nine subregions of order «(H) = (ociQiu), a 2 (h 22 )) T with afhjj) > 0 for j = 1,2: 

a. only one interior subregion denoted as = («i(/zn), 1 -ai(hn)) x {a 2 (h 22 ), 1 - 

otiihii))', 

b. eight boundary subregions divided in two parts as 

(i) four angle subregions denoted by 

T“ (H),A = [0,«i (h n )\ x [0,a 2 (h 2 2)\ U [1 - «i(/zn), 1] x [0 r a 2 (h 22 )] 

U [0,ui(/in)] x [1 - a 2 (h 22 ),l\ U [1 - a 2 (h 22 ),l\ x [1 — « 2 (^ 22 )/lL 

(ii) four edge subregions denoted by 

T" (h)/E = («i(/zn), 1 - afhu)) x [1 - a 2 (h 22 ), 1] U [0, a^hu)] x (a 2 {h 22 ), 1 - a 2 (h 22 )) 
U (ai(h n ), 1 - «i(M) x [0,« 2 (^ 22 >] 

U [1 - ai(/zn), 1] x (a 2 (h 22 ), 1 - a 2 (h 22 )). 

As for the second step, we consider the three functions ip 2 : [0,1] —> [0,1] 
and : < H —» 1R such that 

i l>j(z } ) = aj{hjj){Zj - afhjj) + 1}, Vz ; - € [0,1], ; = 1,2 and t/> 3 (H) = ~^==- (3.12) 

yhnh 22 

Each axis of [0,1] x [0,1] has one interior subregion (n ; (/i ;; ), 1 - afhjj)) and two 
boundary regions [0 ,ctj(hjj)\ and [1 - a\(hjj),l] for j = 1,2. Thus, from (2.27) with 
d — 1, one gets the new parametrization of each margin beta kernel used in BSg ^ x/H ) 
of (3.8) with x = {xi,x 2 ) t \ for j = 1,2, 


if*,- €[o, «,(/.„)], 

(fc/,)), 



(3.13) 
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Therefore, using i p 3 of (3.12) and by combination of (3.13) for each of the nine 
subregions of [0,1] x [0,1], then 9 is expressed by 


0(x,H) = 



V'lhl) Xj_ Iplfa) X2_ _Jh 2_) T 
hn ' hi' hn ' h 22 r (/111/122) 1 / 2 / 

l/'lhl) Xj_ 1-X 2 ]hO~X 2) _^12_\ T 

' /!„' h 22 ' h 22 ' (h n h 22 ) 1/2 ) 

1-*1 I^ih-n) ^(xz) X2_ h 12 \ T 

hi ' h n ' h 22 ' h 22 r (h n h 22 fi 2 ) 

l-x 1 ^i(l-n) l-x 2 ip 2 (l-x 2 ) /; 12 

hi ' hi ' /*22 ' /l 22 ' (hlhg ) 1 / 2 ) 

*1 1 —*1 *2 l-x 2 __k 12_\ _ a / v TT\ 

/in' /ill ' fe' /I22 ' (/lll/l 22 ) 1/2 / ' ' 

l/'lhl) £l_ X2_ 1—X 2 /ll2 ) T 

/ill 7 /ill 7 /l22 7 /l22 ' (/lufe) 1 / 2 / 

Xi l-xi l/’2(X2) 1-X2 /112 \ T 

/Ill' /Hi ' /122 ' /122 ' (/lll/l22) 1/2 / 

Xl_ l~Xi 1-X 2 lj' 2 (l-X 2 ) /? 12 \ T 

/Ill' /ill ' /I22 ' /122 ' (/lll/l22) 1/2 / 

1—x t iMl-X!) x 2 1—x 2 /ii 2 ) T 

/ill ' /ill ' /122' /122 ' (huh 22 ) 112 ) 


if X G [0,Ui(/7n)] X [0,U 2 (?/ 22 )] 

if X g [0,«i(i/n)] x [1 - a 2 (h 22 ), 1] 

if x G [1 - a^/iu), 1] x [0,n 2 (/hi)] 

if x g [1 - a\{hn), 1] x [1 - a 2 (h 22 ), 1] 

if x g (ui(/7n), 1 - «i(/in)) x (a 2 (h 22 ), 1 - a 2 (h 22 )) 

if x g [0,ui(/7 n )] x (a 2 (h 22 ), 1 - a 2 (h 22 )) 

if x g («i(/7n), 1 - ai(h n )) x [0,« 2 (/z 22 )] 

if x g {ai{}i n ), 1 - ai(hn)) x [1 - a 2 (h 22 ), 1] 

if x g [1 - «i(/7n), 1] x (a 2 (h 22 ), 1 - a 2 {h 22 )). 

(3.14) 


Notice that the last component hi 2 (hnh 22 )~ L/2 of 9 does not change in all subregions 
of the support. This is because of the choice of m 0 related to the construction of 
the beta-Sarmanov kernel (3.8). According to Proposition 2.14 the modified beta- 
Sarmanov kernel obtained from (3.14) and denoted by £>Sg (xI1) is an associated 
kernel with Sg (x H) = [0,1] x [0,1], 


ag(x, H) 


/(l—Xl)l/*l(Xl)+X 2 (1—x 2 )l/* 2 (x 2 )+x 2 \ T 

\ Xj+1 p^X-i) ' x 2 +lp 2 (x 2 ) J 

/ (l-xOi/'ifxO+x 2 \ T 
[ x 1+ iMxi) ,u ; 

/ (l-Xi)l/>i(X!)+X 2 (l-x 2 )(x 2 -l/) 2 (l-x 2 )| \ T 

( Xi+t/i^Xi) ' l-X 2 +l/’ 2 (l-X 2 ) ) 

/ (1-X 2 )l/> 2 (X 2 )+X 2 \ T 

1 ' X 2 + l p 2 (x 2 ) ) 

(0,0) T 

(n (l-x 2 )|x 2 -l/) 2 (l-x 2 )| \ T 
\ ' l-X 2 +l/' 2 (1-X 2 ) / 

/ (1—X 1 )jx 1 —X!)} (l-x 2 )ljl 2 (x 2 )+x 2 \ T 
( l-Xl+l/'l(l-Xl) ' x 2 +\p 2 (x 2 ) J 

( (1—X t ){Xl~ 1^1 (1—X~[)} p.\ T 

V 1-x 1 +i[j 1 (1-x 1 ) ,U J 

( (l-XpjX!-^^!—Xi)} (1— X 2 )(x 2 — l/’ 2 (l—x 2 )| \ T 
. V l-X 1 +lfj 1 {l-X 1 ) ’ l-x 2 +ip 2 (l-x 2 ) / 


if X G [O,^!^!!)] X [0,U 2 (/7 22 )] 

if X g [0, x (a 2 {h 22 ), 1 - a 2 {h 22 )) 

if x g [0,ni(/7n)] x [1 - a 2 (h 22 ), 1] 

if X g (ai(hn), 1 - oti{h n )) x [0,« 2 (lz 22 )] 

if X G («i(fzn), 1 - Ui(/7n)) x (a 2 (h 22 ), 1 - a 2 (h 22 )) 

if X g (ai(/7n), 1 - otiQin)) x [1 - a 2 (h 22 ), 1] 

if X g [1 - ai(hn), 1] x [0, a 2 (h n )\ 

if x g [1 - oi\(h\\), 1] x ( a 2 (h 22 ), 1 - a 2 (h 22 )) 

if x G [1 - «i(/7n), 1] x [1 - a 2 (h 22 ), 1] 

(3.15) 
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and, Bg(x, H) = (fc^) ; , /= i, 2 such that fo rTl2 = {h 12 /(h n h 22 ) 1/2 ^jbg n bg 22 and (b§ n ,bfa} is 
detailed as 


/ h n Xllpl(x 1 ){x 1 +ll) 1 (x 1 )} 2 h 22 x 2 4’ 2 (x 2 ){x 2 +lp 2 (x 2 )l 2 \ 

\ fe+l/>l(Xl)+/ln} ' { X 2 +1p2(x 2 )+h22} ) 

( frii*iih(*i)l*i+A(*i))~ 2 h 22 x2(i-x 2 ) \ 

\ fe+l/>l(Xl)+/lll} ' 1+//22 / 

/ ^inn/ ; i(n)l-n+i/ , i(^i))~ 2 ^ 22 ( 1 —^ 2 ) 11 —-^'2+<^2(i—^'2)>~ 2 \ 

{Xi+lpi(Xi)+hn) ' [ll> 2 (l-X 2 )}- 1 {l-X 2 +ll> 2 (l-X 2 )+h 2 2 } J 

ftngiCj-gi) h22x 2 ip 2 (x 2 ){x 2 +4' 2 (x 2 )}- 2 \ 

1+^11 ' (x 2 +l/’2(X2)+/!22l / 

/i22-t2(l--T 2 ) \ 
l+Z/ii / I+/Z 22 / 

//llJCl ( 1 — _ri) /i22(l-*2)(l-*2+ 1 / ; 2(l--' t: 2)r^ ) 

l+h n ' {l/’ 2 (l—^2)}“ 1 {1—^2+j/’2(l—-^2)+/Z22} ) 
ftn(l-*i)|l-*i+ih(l-*i)r 2 h 22 x 2 4< 2 (x 2 ){x 2 +il< 2 (x 2 )}- 2 \ 

{tM 1- *l)} _1 (l— Xi+ipi(l-x 1 )+h n )' {x 2 +4’ 2 (x 2 )+h 22 } J 

^n(i-n)li~n+ij ; i(i-n)r 2 h 22 x 2 (i-x 2 ) \ 

l4 ) -i(l-x\)}~ 1 {l-xi+4>i(l-xi)+hii}' l+h 22 J 

/in(i-^i)|i--n+i/'i(i-n)r 2 /;22(i-x 2 )|i-x 2 +i/>2(i-^2)r 2 

(l/'l(l-^l)l _1 (l-n+l/'l(l-^l)+^lir |l/’ 2 (l-^ 2 )l _1 (l-^ 2 +l/' 2 (l-i' 2 )+/j 22 l 


if x G [0, ai(hn)] X [0, n 2 (lz 22 )] 

if x G [0, a\(hn)] x ( a 2 (h 22 ), 1 - a 2 (h 22 )) 

if x G [0,«i(/z n )] x [1 - a 2 {h 22 ), 1] 

if x G (ni(/zn), 1 - a\(hn)) x [0,a 2 (h 22 )] 
if x G T 2 mi 

if x g («i(/7n), 1 - a-iQin)) x [1 - a 2 {h 22 ), 1] 
if x g [1 - ni(izn), 1] x [0, a 2 {h n )] 
if x g [1 - ai(/zn), 1] x (a 2 (h 22 ), 1 - a 2 (h 22 )) 
j if x G [1 - ni(izn), 1] x [1 - a 2 {h 22 ), 1], 


The corresponding modified beta-Sarmanov kernel estimator 



BS 6(x,H >( x 0, Vxg[0,1]x[0,1] 


(3.16) 


has, for all x G = (ai(h n ), 1 - ci\{h\\)) x (a 2 (lz 22 ), 1 - cc 2 (h 22 )), 

, 1 { d 2 f d 2 f d 2 f ) 

Bias|/„(x)( = - |^ 11 -(x) + 2b dj , 2 j—^(x) + b di22 — |(x)| + o(/z n + 2/z 12 + h 22 ) 


and 


Var j/„(x)j - Var j/„(x)} as n —> 00 . 


Thus, the asymptotic expression of the MISE of f n on T 2 (H) ' 7 is given by 


AMISE- 


(7) - 

f ' 

If 

w - 


2 1 


d 2 f 


(x) + 2 fcr 


d 2 f 


d2 f, 


e < 12 dx 2 dx 2 {x) + b9l22 dx 2 2 ix) j 


+ ~ ||sSe( x ,H)|| 2 /(x)j(ix. 


The modified beta-Sarmanov kernel also depend on the choice of scalars (XjQijj) for 
j = 1,2. The user can set the values of ctjQijj ) according to his practical objective. 
For example in univariate case, Chen (1999, 2000) took aj(hjj) = 2 hjj. From (3.7) to 
(3.16) and when h 12 = 0, we have the same formulas for multiple associated kernels 
of Bouerzmarni and Rombouts (2010). Also, similar results can be obtained for 
the Scott bandwidth matrices. However, numerical illustrations are so long and 
tedious tasks; see, e.g., Hirukawa and Sakudo (2014) for d — 1. 


25 


























4. Simulation studies and real data analysis 

In this section, we compare the performance of the three forms of bandwidth 
matrices of Table 2.1. The optimal bandwidth matrix is chosen by LSCV method 
(2.25) using the algorithms Al, A2 and A3 given at the end of Section 3.3.1 and 
their indications. All computations were done on the computational resource 1 of 
Laboratoire de Mathematiques de Besancon by using the R software; see R De¬ 
velopment Core Team (2012). The comparisons will be done using the standard 
version of the beta-Sarmanov kernel estimator (3.11) through simulations studies 
and an illustration on real dataset. 


4.1. Simulation studies 

We consider six target densities with supports included in [0,1] X [0,1] and 
labeled A, B, C, D, E and F respectively. They have different correlation structure 
and some local modes. The plots for these densities are given in Figure 4.1. 


• Density A is the bivariate beta density without correlation p = 0 such that 
(pi,qi) - (3,3) and (p 2 ,^ 2 ) = (5,5) as parameters values in univariate beta 
density (3.1), respectively; 

• density B is the bivariate Dirichlet density 


f{vi,v 2 ) 


Y{pC\ + U 2 + OL 3 ) 

r(ai)r(« 2 )r(«3) 


v 


a\-l 

1 


V 


« 2“1 

2 


(i-v 1 -v 2 r~\ 


U\, V 2 >0,V\+V2< 1 , 


where T(-) is the classical gamma function, with parameters values a.\ = « 2 = 
2, a 3 = 7 and, therefore, the moderate value of p = -(ni« 2 ) 1/2 («i + « 3 ) _1/2 (a 2 + 
a 3 )- 1/2 = - 0 . 2222 ; 


• density C is the bivariate beta density without correlation p = 0 such that 
(Pi,qi) = (3,2) and (p 2 ,^ 2 ) = (2,5) in (3.1), respectively; 

• density D is the bivariate Dirichlet as the density B but with a 3 = « 2 = 10, 
« 3 = 3 and then p = -0.7692. 

• density E is the bivariate density without correlation p = 0 defined as follows: 


f(vi,v 2 ) = [(3/7)#!(zq) + (4l7)g 1 (v 1 )\ x g 3 {v 2 ) 


such that gi, g 2 and g 3 are univariate beta densities (3.1) with parameters 
values (pi,cji) = (2,7) and {p 2 ,qi) = (7,2) and ( p 3r c] 3 ) = (6,6). 


Dell Poweredge R900, Processor Xeon X7350, 2.93 GHz, 32 Go RAM 
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• density F is the bivariate density without correlation p = 0: 

f(v lf v 2 ) = [(8/11)^) + (3/ll)g 2 (^i)] x [(5/7)g 3 (v 2 ) + (2 /7)g 4 (v 2 )\ 

such that g\, g 2 , g 3 and g 4 are univariate beta densities (3.1) with parameters 
values (pi,qi) = (3.5,7) and (p 2r q 2 ) = (7,3.5), (p 3 ,q 3 ) = (7,2) and (p 4/ q 4 ) = 
(2,7). 

Table 4.1 presents the execution times needed for computing the LSCV method 
for each of the three types of bandwidth matrix with respect to only one replication 
of sample sizes n = 100 and 500 for each of the target densities A, B, C, D, E and 
F of Figure 4.1. For n = 100 the computational times of the LSCV method for 
full bandwidth matrix are longer than both the Scott and diagonal bandwidth 
matrices, which have the almost identical Central Processing Unit (CPU) times. 
The constraints (3.9) of the correlation of Sarmanov induce that execution times of 
the Scott bandwidth matrix are relatively a bit longer than for the diagonal ones. 
Let us note that for full bandwidth matrices, the execution times become very large 
when the number of observations is large as seen in Table 4.1; however, these CPU 
times can be considerably reduced by parallelism processing, in particular for for 
the full LSCV method. These constraints (3.9) reflect the difficulty of finding the 
appropriate bandwidth matrix with correlation structure by LSCV method (2.25). 
Flence, we are able to infer that the Scott and diagonal LSCV method do not impose 
excessive computation burdens; and, the Scott procedure takes into account the 
structure of (null and moderate) correlations in the sample. 


n H 

A 

B 

C 

D 

E 

F 

Full 

2.8310 

2.7570 

2.8793 

2.7931 

27940 

2.8362 

100 Scott 

0.2173 

0.2251 

0.2207 

0.2202 

0.2212 

0.2204 

Diagonal 

0.1436 

0.1456 

0.1526 

0.1536 

1.1531 

1.1446 


Table 4.1: Typical CPU times (in hours) for one replication of LSCV method (2.25) 
by using the algorithms Al, A2 and A3 given at the end of Section 3.3.1. 


We now examine the efficiency of various bandwidth matrices in Table 2.1 via 


,SE = N. 


El-if j/„(x)-/(x)) 2 <fx. 


where N = 100 is the number of replications. Table 4.2 shows some expected values 
of ISE for the three forms of bandwidth matrices with respect to the densities A, 
B, C, D, E and F of Figure 4.1, and according only to the sample size n = 100 
because of excess of computational times (see Table 4.1). Globally, the full and 
Scott bandwidth matrices with correlation structure perform better in terms of 
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0.0 0 0 


1 0 0.0 


(A) 


(B) 



Figure 4.1: Six plots of density functions defined at the begining of Section 4.1 and 
considered for simulations. 
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the quality of smoothing than the diagonal one without correlation. Even if the 
correlation is almost non-existent in the sample (e.g. models A and C of Table 4.2), 
we attend the good behavior of the full and Scott bandwidth matrices. Also, these 
bandwidth matrices with correlation structure suit for multimodal target densities 
(e.g. E and F of Table 4.2). For moderate correlation (e.g. models B of Table 4.2) we 
can recommend the light version of bandwidth matrices with correlation structure 
which is the Scott one. As for strong correlation in the sample (e.g. models D of 
Table 4.2) it is not preferable to use the Scott bandwidth matrix. 


Models 

Full 

Scott 

Diagonal 

A 

0.2554(0.2253) 

0.1887(0.0890) 

0.2963(0.2121) 

B 

0.8571(0.2422) 

0.5999(0.2582) 

0.9743(0.5973) 

C 

0.1985(0.0737) 

0.2150(0.0828) 

0.2384(0.1740) 

D 

1.1481(0.0937) 

9.1188(0.7466) 

1.6420(0.5735) 

E 

0.2786(0.0785) 

0.2498(0.0985) 

0.5056(0.1526) 

F 

0.6025(0.1122) 

0.6791(0.1475) 

0.8025(0.1254) 


Table 4.2: Expected values (and their standard deviation) of ISE with N = 100 
replications of sample sizes n = 100 using three types of bandwidth matrices H 
(2.25) for each of four models of Figure 4.1. 


Finally, Tables 4.1 and 4.2 indicate that the choice of the Scott bandwidth matrix 
using ESC V method is a good alternative to the full one in the purpose of preserving 
a correlation structure in the bandwidth matrix for an associated kernel estimator. 

4.2. Real data analysis 

We applied the standard version of the beta-Sarmanov estimator (3.11) on 
paired rates data set according to the three types of bandwidth matrices in Table 2.1. 
The dataset of sample size n = 80 in Graph (o) of Figure 4.3 has been provided by 
Francial G. Fibengue (2013) during his last stay in Burkina Faso. It represents the 
popular ratings, for the two first consecutive ballots of the same electoral mandate 
of five years, of a political figure in different departments. Note that, for both 
elections, the prominent politician was finally elected in the second round. We 
are here interested to the opposite behavior of peoples during first rounds of both 
elections since the second round is governed by political alliances, which do not 
constitute a reference for the own popularity of a candidate. 

Indeed, for many African countries, political elections are generally fought on 
tribal ethnic origins and partisan interests. The data displays opposed viewpoints 
between the results of the first round of the first election (x\) and the first round of 
the second one {x 2 ). In fact, the first election saw the candidate program mostly 
adopted by its clan (tribe and allied tribes) and rejected by the others. A few years 
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later, facing the social discontent, he assumed a new political program which 
ends to another consultation of the people in a time x 2 - This reversal made him 
loose the support of his supporters but he received the backing up of formers 
opponents. It is thus noticeable that its own popularity is not much different 
between the first round of both elections; the first gave an average X\ = 0.4915 and 
the second x 2 = 0.4126. However, there is a significant negative correlation in the 
dataset: p~ = -0.6949. Unlike overall trends X\, x 2 and p~, the empirical distribution 
Graph (o) of Figure 4.3 gives more details of the electoral situation department by 
department. Hence, we need a nonparametric smoothing of this joint distribution 
by using associated kernels. 

In order to smooth the joint empirical distribution of these paired data, we 
apply the beta-Sarmanov kernel estimator in its standard version (3.11). Figure 4.2 
shows the results of the LSCV algorithms Al, A2 and A3 with the ratings dataset; 
see (2.25) and the end of Section 3.3.1. The computation time of the LSCV is in the 
same trend as in Table 4.1 for n = 100. To simplify the presentations in Figure 4.2 
for both the Scott and full bandwidth matrices, we only plot h 12 i—> LSCV(H) for 
some values of hy and h 22 . In all cases we observe that there is a global minimum. 
The obtained optimal bandwidth matrices are 



Figure 4.2: Some plots of H i—» LSCV(H) from algorithms at the end of Section 3.3.1 
for real data in Graph (o) of Figure 4.3: (Al) full H, (A2) Scott H and (A3) diagonal 

H. 


- _ /0.160000 0.000655) 

Hfu11 “ (0.000655 0.057500 j' 

/0.076039 0.000622) _/0.106836 0.000874) 
(0.000622 0.073994j ~~ (0.000874 0.103962j 
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and Hdiagonai = Diag (0.160000,0.057500). The resulting estimates are displayed in 
Figure 4.3. 

From simulation studies of previous section, the full bandwidth matrix pro¬ 
vides the reference smoothing which is appropriated for correlated data; see Graph 
(a) of Figure 4.3. In record time, the Scott bandwidth matrix Hs cot t delivers similar 
smoothing in Graph (b) of Figure 4.3 as the full and diagonal ones (see respectively 
Graphs (a) and (c) of Figure 4.3). In conclusion, we found anywhere the shape of 
a "carpet flying" in balance, smoothing thus the joint empirical distribution of the 
electoral situation of the candidate. This balance situation makes him to win in 
the second round of both elections. 


5. Summary and final remarks 

We have presented general associated kernels (with or without correlation 
structure) that varying their shape according to the target point along the support. 
Excluding the classical associated kernels, the local adaptability of these associated 
kernels (depending on the target point x and the bandwidth matrix H) means that 
they are free of boundary bias but have a slightly bias different. Furthermore, the 
forms of bandwidth matrices used in the case with correlation structure have both 
theoretical and practical significances. Under the criterion of cross-validation, we 
therefore recommend the Scott bandwidth matrix which is more workable than 
the full one. A method of construction, called multivariate mode dispersion method, 
for these kernels are introduced. Also, we have proposed an algorithm of bias- 
reductions of their corresponding associated kernel estimators. An extension to 
discrete multivariate associated kernels is obviously possible. Similarly, a work is 
in progress on nonparametric multiple regression composed by continuous and 
discrete univariate associated kernels (e.g. Proposition 2.6). 

Constructed by the correlation structure of Sarmanov (1966) and by a vari¬ 
ant of mode dispersion method, the bivariate beta-Sarmanov kernel estimator is 
completely study with the optimal bandwidth matrix chosen by cross-validation. 
This technique can be extended in multivariate case for different type of kernels 
which are continuous and also discrete. In fact, from two or more univariate inde¬ 
pendent pdfs or probability mass functions, the correlation structure of Sarmanov 
(1966) and a variant of mode dispersion method can always allow to build a mul¬ 
tivariate type of kernel with correlation structure; and, therefore, produced the 
corresponding multivariate Sarmanov kernel. In terms of execution times from 
using the cross-validation method, we advise to use the Scott bandwidth matrix 
because of its flexibility and efficiency for no very strong correlation in the dataset. 

Simulation experiments and analysis of a real dataset provide insight into the 
behavior of the bandwidth matrix for small and moderate sample sizes. Tables 4.1 
and 4.2 and Figure 4.3 can be conceptually summarized as follows. As expected. 
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Figure 4.3: Graphical representations (o) of the real dataset with n = 80, X\ = 
0.4915, x 2 = 0.4126, hi = 0.2757, = 0.2720, p~ = -0.6949 and its corresponding 

smoothings according to the choice of bandwidth matrix H by LSCV: (a) full, (b) 
Scott and (c) diagonal. 




the full bandwidth matrix is frequently better than the others. An alternative with 
correlation structure has been proposed for the cross-validation technique: the 
Scott bandwidth matrix which has a comparable gain in execution times than the 
diagonal one; also, it produces better results than the diagonal in most cases. So, 
we recommend the Scott bandwidth matrix for multivariate use with the cross- 
validation technique. Further research in this direction are in progress, especially 
on the choice of optimal bandwidth matrix by using Bayesian approaches; see, 
e.g., Zougab et al. (2014). 


6. Appendix 


This section is dedicated to the proof of Proposition 2.11. Indeed, using suc¬ 
cessively (2.16) and Taylor's formula around E (Z 0 (x,hj) arid then x, and also the 
invariance under cyclic permutations of the operator trace, the result (2.18) is 
shown by 

Bias j/„(x)) = E {/ (Z 0 ( X ,H))) - /(*) 

= /(e (Ze( x ,H))) + ^ |trace j(Z 0 ( x ,H) - E (Z 0 ( X/ H))) V 2 / (e (Z 0 ( X/ h))) 

(-3e(x,H) - E (Zfl( X/ H)))}] - /(x) + o [e|(Z 0(x ,h) - E (Z 0 ( X ,H))) {Ze( x ,H) ~ E (Z 0 ( X/ H)))| 

= /(E (Z 0 ( X ,H))) ~ f( x ) + 2 traCe ( E (-^ 0 (x,H))) e{(Z 0(X/ H) - E (Z 0 ( X/ H))) 

(Ze(x,H) ~ E (Z 0 ( X ,H)))}] + o [trace [e |(Z 0 ( x ,h) - E (Z 0 ( X/ h))) [Ze(x,n) - E (Z 0 ( X/ h))) | j| 

= / (E (Z 0(x ,H))) - /(x) + ^ trace [v 2 / (e (Z 0(x ,h))) Cov (Z 0 ( X/ h))] + o [trace [Cov (Z 0 ( X/ H))}] 

= / (x + a 0 (x, H)) - /(x) + ^ trace [v 2 / (x + a 0 (x, H)) B 0 (x, H)] + o [trace {B 0 (x, H)}] 

= aJ(x,H)V/(x) + ^ trace [{a 0 (x,H)ag(x,H) + B 0 (x,H)} V 2 /(x)] + o {trace (h 2 )} . 

In fact, the rest o [trace (H 2 )} comes from trace(B 0 (x, H)) = O (trace H 2 ) deduced 
from Proposition 2.4 of classical associated kernels and 


o |E |(Z 0 (x,H) - IE (Z 0 (x,H))) (-Z^ 0 ( X/ H) - IE (Z 0 (X,H))) 

E lOp (Zd(x, H) - E (Z 0 (x,H))) {Z* 0 ( X/ H) - E (Z 0 (x,H))) 1 / 


where o p (-) is the probability rate of convergence. 
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Concerning the variance (2.19) one first has 

Var (AM) = h |k^ h) (x,)j - \ [e (x,)jf 

= l f ^ fcH) (u)/(u)«iu - \ [K (x efcH) (X ,))] 2 

'-'Safx.HjOTrf 

- h ~ h- 

From (2.16) and (2.18), one has the following behavior of the second term 

I 2 := (1 In) [E {k 0(x , h) (Xi)}] 2 - (1 /n)f 2 (x) - O (1/n) 

since / is bounded for all x G T c f. By using Taylor's expansion around of x, the first 
term J, := (1/n) ^ K^ (x H) (u)/(u)rfu becomes 

I 1 = l/(x)f K 2 g (u)A. + R(x,H), 

^ ds fl( x, H) nT rf 

with 

R (x, H) = - f K 2 g(x H) (u) [(u - x) T V/(x) + )- (u - x) T V 2 /(x) (u - x) 

n ds e(x ,H)nT rf ' L z 

+ o {(u - x) T (u - x)}J du. 

A similar argument from Chen (1999, Lemma) with / bounded on T d shows the 
existence of r 2 and then the condition ||L W ( x ;h)|| 2 < c 2 (x)(detH) _ ' 2 leads successively 
to 


0 s R < X ' H > S f c 2 (x)((u-x) t V/(x)+Lu-x) t V 2 /(x)(u-x) 

n (,aet tij - J Se(x , H) nT rf ^ z 

- o {n _i (detH) _, ' 2 | .■ 
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